Optimize Threshold (Subprocess) (Operator Toolbox)

Synopsis

This operator evaluates different thresholds and delivers a model with the best threshold attached.

Description

When solving a classification task, the model derives a confidence for each class. By default, RapidMiner assigns the prediction to the class with the highest confidence. In a binominal classification, the prediction is therefore set to the class whose confidence is larger than 0.5. This operator automatically tries thresholds between 0 and 1 to find the one that results in the highest performance. The performance measure is calculated by the inner process.
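The search described above can be pictured with a small Python sketch. It is an illustration only, not the operator's actual implementation: the function names, the evenly spaced grid of 101 candidate thresholds, and the rule "predict positive when the confidence is strictly larger than the threshold" are all assumptions made for this example. The performance callable stands in for the inner process.

```python
from typing import Callable, List, Sequence, Tuple


def f1_score(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """F1 for the positive class (True); returns 0.0 when undefined."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def optimize_threshold(
    confidences: Sequence[float],
    labels: Sequence[bool],
    performance: Callable[[Sequence[bool], Sequence[bool]], float],
    steps: int = 101,  # assumed grid size, not an operator parameter
) -> Tuple[float, float]:
    """Try evenly spaced thresholds in [0, 1] and keep the best-scoring one."""
    best_threshold, best_score = 0.5, float("-inf")
    for i in range(steps):
        threshold = i / (steps - 1)
        # Assumed decision rule: positive if confidence exceeds the threshold.
        predictions: List[bool] = [c > threshold for c in confidences]
        score = performance(labels, predictions)
        if score > best_score:  # ties keep the lowest threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Confidences for the positive class and the true labels (toy data).
confidences = [0.1, 0.3, 0.35, 0.8]
labels = [False, False, True, True]
best_threshold, best_score = optimize_threshold(confidences, labels, f1_score)
```

On this toy data the default threshold of 0.5 would miss the third example, while a lower threshold separates the classes perfectly. Swapping in a different performance function changes which threshold wins, which is why the measure is left to the inner process.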

Input

  • exa (Data Table)

    The ExampleSet you want the thresholds to be optimized on.

  • mod (Model)

    The model for scoring. Providing a model is optional, but recommended, because the operator then returns a GroupedModel. If you do not provide a model, the ExampleSet at the exa input port must already be scored, that is, it must contain a prediction attribute.

Output

  • exa (Data Table)

    The scored example set with the best thresholds applied.

  • mod (Model)

    If you provide a model as an input, you will receive a GroupedModel that combines the original model with the threshold model. You can use it to score new data and apply the threshold in one step. If you do not provide a model, you will only get the threshold model, which can be applied using Apply Model.

  • per (Performance Vector)

    The performance of the best threshold.
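The difference between the two possible model outputs can be sketched in Python as follows. The classes ThresholdModel and GroupedModel here are hypothetical stand-ins written for this illustration and do not reflect RapidMiner's internal classes or API. The scorer is any function that returns a confidence for the positive class.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ThresholdModel:
    """Hypothetical stand-in: turns confidences into predictions."""
    threshold: float

    def apply(self, confidences: Sequence[float]) -> List[bool]:
        # Assumed rule: positive when the confidence exceeds the threshold.
        return [c > self.threshold for c in confidences]


@dataclass
class GroupedModel:
    """Hypothetical stand-in: scores rows, then applies the threshold."""
    scorer: Callable[[Sequence[float]], float]
    threshold_model: ThresholdModel

    def apply(self, rows: Sequence[Sequence[float]]) -> List[bool]:
        confidences = [self.scorer(row) for row in rows]
        return self.threshold_model.apply(confidences)


def mean_scorer(row: Sequence[float]) -> float:
    """Toy scoring function used only for this example."""
    return sum(row) / len(row)


# Without an input model: only the threshold model, for already scored data.
threshold_only = ThresholdModel(threshold=0.3)
from_scored = threshold_only.apply([0.1, 0.9])

# With an input model: scoring and thresholding happen in one call.
grouped = GroupedModel(scorer=mean_scorer, threshold_model=threshold_only)
from_raw = grouped.apply([[0.2, 0.2], [0.4, 0.6]])
```

The grouped form is the more convenient one for new, unscored data, because the threshold cannot be forgotten or applied with a stale value.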

Tutorial Processes

Optimizing Threshold for F1-Score

In this process, we use Optimize Threshold (Subprocess) to find the threshold that yields the best F1-Score. To do so, we first train a Generalized Linear Model. Performance is evaluated with the Performance (Binominal Classification) operator.